A weighted U-statistic for genetic association analyses of sequencing data.

Authors

  • Changshuai Wei
  • Ming Li
  • Zihuai He
  • Olga Vsevolozhskaya
  • Daniel J Schaid
  • Qing Lu
Abstract

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, high-dimensional sequencing data pose a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.
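
The general idea of a weighted U-statistic association test can be sketched as below. This is a minimal illustration and not the authors' WU-SEQ implementation: the rank-based normal-score phenotype kernel, the weighted cross-product genetic similarity, the one-sided permutation p-value, and all function and parameter names (`normal_scores`, `weighted_u`, `variant_weights`, and so on) are assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(y):
    """Rank-based normal scores; ties are broken by order of appearance."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1          # ranks 1..n
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - 0.5) / n) for r in ranks])


def genetic_similarity(G, variant_weights):
    """Weighted cross-product similarity between subjects (assumed form)."""
    G = np.asarray(G, dtype=float)                  # subjects x variants
    Gw = G * np.sqrt(np.asarray(variant_weights, dtype=float))
    return Gw @ Gw.T


def weighted_u(y, G, variant_weights):
    """Average over ordered pairs i != j of weight_ij * phenotype kernel_ij."""
    q = normal_scores(y)
    S = np.outer(q, q)                              # phenotype similarity
    W = genetic_similarity(G, variant_weights)      # genetic weights
    n = len(q)
    off_diag = ~np.eye(n, dtype=bool)
    return (W[off_diag] * S[off_diag]).sum() / (n * (n - 1))


def permutation_pvalue(y, G, variant_weights, n_perm=999, seed=0):
    """One-sided p-value from permuting phenotypes (assumed inference step)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = weighted_u(y, G, variant_weights)
    hits = sum(
        weighted_u(rng.permutation(y), G, variant_weights) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

Because the phenotype enters only through its ranks, a heavy-tailed trait does not dominate the statistic in this sketch, which is the kind of robustness the abstract attributes to a nonparametric U-statistic.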

Similar resources

Identifying Genetic Variants for Addiction via Propensity Score Adjusted Generalized Kendall's Tau.

Identifying replicable genetic variants for addiction has been extremely challenging. Besides the common difficulties with genome-wide association studies (GWAS), environmental factors are known to be critical to addiction, and comorbidity is widely observed. Despite the importance of environmental factors and comorbidity for addiction study, few GWAS analyses adequately considered them due to ...

Genome-wide joint analysis of single-nucleotide variant sets and gene expression for hypertension and related phenotypes

BACKGROUND With the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible. Nevertheless, the massive amount of sequencing data brings great computational and statistical challenges for association analyses. Aside from sequencing variants, other high-throughput omic data (eg, gene expression data) also ...

Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls

Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This h...

Fuzzy Programming for Parallel Machines Scheduling: Minimizing Weighted Tardiness/Earliness and Flow Time through Genetic Algorithm

Appropriate scheduling and sequencing of tasks on machines is one of the basic and significant problems that a shop or a factory manager encounters; this is why in recent decades extensive studies have been done on scheduling issues. One type of scheduling problems is just-in-time (JIT) scheduling and in this area, motivated by JIT manufacturing, this study investigates a mathematical model for...

Fuzzy Programming for Parallel Machines Scheduling: Minimizing Weighted Tardiness/Earliness and Flowtime through Genetic Algorithm

Appropriate scheduling and sequencing of tasks on machines is one of the basic and significant problems that a shop or factory manager encounters, which is why extensive research has been done on scheduling issues in recent decades. One type of scheduling problem is just-in-time (JIT) scheduling and, in this area, motivated by JIT manufacturing, this study investigates a mathematical ...

Journal:
  • Genetic Epidemiology

Volume 38, Issue 8

Publication date: 2014